# Tutorial V: Deep models

<p>
Bern Winter School on Machine Learning, 27-31 January 2020<br>
Prepared by Mykhailo Vladymyrov.
</p>

This work is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

In this session we will use the pretrained Inception model to build our own image classifier. We will also learn how to save our trained models.

## 1. Load necessary libraries

In [0]:
# if using google colab
%tensorflow_version 2.x

In [0]:
import sys
import os

import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipyd
import tensorflow as tf
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants

from PIL import Image

# We'll tell matplotlib to inline any drawn figures like so:
%matplotlib inline
plt.style.use('ggplot')

from IPython.core.display import HTML
HTML("""<style> .rendered_html code { 
    padding: 2px 5px;
    color: #0000aa;
    background-color: #cccccc;
} </style>""")

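physical_devices = tf.config.experimental.list_physical_devices('GPU')
# enable memory growth only if a GPU is present (avoids an IndexError on CPU-only machines)
if physical_devices:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)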

### Download libraries

In [0]:
p = tf.keras.utils.get_file('./material.tgz', 'https://scits-training.unibe.ch/data/tut_files/material.tgz')
!mv {p} .
!tar -xvzf material.tgz > /dev/null  2>&1

In [0]:
from utils import gr_disp
from utils import inception

## 2. Convert the Inception model to TF2 format

Let's load the graph definition and inspect the graph:


In [0]:
def load_graph_def(file_path, use_GPU=True):
    with tf.compat.v1.gfile.GFile(file_path, "rb") as f:
        graph_def = tf.compat.v1.GraphDef()
        graph_def_str = f.read()
        if use_GPU:
            graph_def_str = graph_def_str.replace(b'/cpu:0', b'/gpu:0')
        graph_def.ParseFromString(graph_def_str)
    return graph_def
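
The device swap in `load_graph_def` is a plain byte substitution on the serialized graph. Below is a minimal sketch of why this is safe, using a made-up byte string rather than a real `GraphDef`: protobuf stores each string with a length prefix, and the two device names have the same length, so the prefixes stay valid after the swap.

```python
# Stand-in for a few serialized bytes; NOT a real GraphDef (hypothetical data).
# Each string is written as <length byte><bytes>, as protobuf does for short strings.
fake_serialized = b"\x06/cpu:0" + b"\x05input" + b"\x06/cpu:0"

old, new = b"/cpu:0", b"/gpu:0"
assert len(old) == len(new)           # same length: length prefixes remain correct

n_swapped = fake_serialized.count(old)
patched = fake_serialized.replace(old, new)
print(n_swapped, len(patched) == len(fake_serialized))
```

A replacement of different length would corrupt the length prefixes, so this trick should not be generalized to arbitrary device names.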

In [0]:
gd = load_graph_def('inception/tensorflow_inception_graph.pb')

In [0]:
gr_disp.show(gd)

Helper function for model conversion:

In [0]:
def convert_model(file_path, save_path, io_tensors, use_GPU=True):
    builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(save_path)
  
    graph_def = load_graph_def(file_path, use_GPU)
  
    sigs = {}
  
    with tf.compat.v1.Session(graph=tf.compat.v1.Graph()) as sess:
        # name="" is important to ensure we don't get spurious prefixing
        tf.compat.v1.import_graph_def(graph_def, name="")
        g = tf.compat.v1.get_default_graph()
        inp = g.get_tensor_by_name(io_tensors[0])
        out = g.get_tensor_by_name(io_tensors[1])
  
        sigs[signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY] = \
            tf.compat.v1.saved_model.signature_def_utils.predict_signature_def(
                {"in": inp}, {"out": out})
  
        builder.add_meta_graph_and_variables(sess,
                                            [tag_constants.SERVING],
                                            signature_def_map=sigs)
  
    builder.save()
    return graph_def

In [0]:
inc_path = 'inception/tensorflow_inception_graph.pb'
inc_path2 = 'inception/saved_tf2'
inc_path2_hbn = 'inception/saved_tf2_hbn'

In [0]:
!rm -rf {inc_path2}
!rm -rf {inc_path2_hbn}

Save model for prediction with specified output:

In [0]:
gd_full = convert_model(inc_path, inc_path2, ['input:0', 'output:0'])  # original model output
gd_hbn = convert_model(inc_path, inc_path2_hbn, ['input:0', 'head0_bottleneck/reshape:0'])  # head0_bottleneck as model output, for our problem

We can now use the model for prediction (similar to what we saw in the last session).

In [0]:
mod = tf.saved_model.load('inception/saved_tf2')

In a model we can also inspect the graph operations and tensors:

In [0]:
op_names = [op.name for op in mod.graph.get_operations()]
#for n in op_names: print(n)

In [0]:
mod.graph.get_tensor_by_name('head0_bottleneck:0')

## 3. Create the graph with regressor

Here we create a keras layer which processes input with the inception model:

In [0]:
class InceptionCut(tf.keras.layers.Layer):
    def __init__(self, output_dim, **kwargs):
        self.mod = tf.saved_model.load('inception/saved_tf2_hbn')
        self.func = self.mod.signatures["serving_default"]
        self.output_dim = self.func.outputs[0].shape.as_list()[1]
        super(InceptionCut, self).__init__(**kwargs)

    def build(self, input_shape):
        super(InceptionCut, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        return self.func(x)['out']

    def compute_output_shape(self, input_shape):
        # input_shape is a single (batch, height, width, channels) shape
        return (input_shape[0], self.output_dim)

And build a keras model using it:

In [0]:
model = tf.keras.models.Sequential([
                                    InceptionCut(-1),
                                    tf.keras.layers.Dense(512, activation='sigmoid'),
                                    tf.keras.layers.Dense(2, activation='softmax')
                                    ])

model.compile(optimizer=tf.keras.optimizers.Adam(0.0005,) ,
              loss='categorical_crossentropy',
              metrics=['accuracy'])

In [0]:
model.build(input_shape=(None,256,256,3))  # needed to initialize the model
model.summary()
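
What the two `Dense` layers on top of `InceptionCut` compute can be sketched in plain NumPy. This is an illustration only: the feature size `n_feat`, the random weights and the helper names `sigmoid` and `softmax` are assumptions, not values taken from the Inception graph or from Keras.

```python
import numpy as np

rng = np.random.default_rng(0)
n_batch, n_feat = 4, 2048   # n_feat is an ASSUMED bottleneck size

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

features = rng.normal(size=(n_batch, n_feat))   # stand-in for the InceptionCut output
W1, b1 = rng.normal(scale=0.01, size=(n_feat, 512)), np.zeros(512)
W2, b2 = rng.normal(scale=0.01, size=(512, 2)), np.zeros(2)

hidden = sigmoid(features @ W1 + b1)   # Dense(512, activation='sigmoid')
probs = softmax(hidden @ W2 + b2)      # Dense(2, activation='softmax')
print(probs.shape, probs.sum(axis=1))
```

Only `W1`, `b1`, `W2` and `b2` correspond to what is trained here; the Inception part acts as a fixed feature extractor.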

## 4. Dataset

The Inception network is trained on natural images: things we see around us every day, like sky, flowers, animals, buildings, cars.
It builds a hierarchy of features to describe what it sees.
These features can be used to train quickly on different classes of objects. E.g. [here](https://www.tensorflow.org/tutorials/image_retraining) are more examples of transfer learning.

Here you will see that these features can even be used to detect things very different from natural images. Namely, we will try to use them to distinguish German text from Italian. We will use 100 samples, taken from 5 German and 5 Italian books, 10 samples each.

In [0]:
text_label = ['German', 'Italian']

In [0]:
labels0 = []
images0 = []
labels1 = []
images1 = []

#German
for book in range(1,6):
    for sample in range(1,11):
        img = plt.imread('ML3/de/%d_%d.jpg'%(book, sample))
        assert(img.shape[0]>=256 and img.shape[1]>=256 and len(img.shape)==3)
        images0.append(inception.prepare_training_img(img))
        labels0.append([1,0])
#Italian
for book in range(1,6):
    for sample in range(1,11):
        img = plt.imread('ML3/it/%d_%d.jpg'%(book, sample))
        assert(img.shape[0]>=256 and img.shape[1]>=256 and len(img.shape)==3)
        images1.append(inception.prepare_training_img(img))
        labels1.append([0,1])
        
idx = np.random.permutation(len(labels0))
labels0 = np.array(labels0)[idx]
images0 = np.array(images0)[idx]
labels1 = np.array(labels1)[idx]
images1 = np.array(images1)[idx]
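
The label encoding and the shared permutation used above can be checked on toy data. In this sketch the "images" are just numbers (an assumption made for brevity); it shows that indexing images and labels with the same `idx` keeps every sample paired with its label, and that `argmax` turns a one-hot label back into a class name.

```python
import numpy as np

text_label = ['German', 'Italian']
labels = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])   # toy one-hot labels
images = np.arange(len(labels)) * 10                   # toy "images": one number per sample

idx = np.random.permutation(len(labels))
shuffled_images, shuffled_labels = images[idx], labels[idx]   # same idx for both arrays

class_idx = labels.argmax(axis=1)                # one-hot -> class index
names = [text_label[i] for i in class_idx]       # class index -> language name
print(class_idx, names)
```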

Let's see a sample:

In [0]:
_, axs = plt.subplots(1, 2, figsize=(10,10))
img_d = inception.training_img_to_display(images0[25])
axs[0].imshow(img_d)
axs[0].grid(False)
img_d = inception.training_img_to_display(images1[25])
axs[1].imshow(img_d)
axs[1].grid(False)
plt.show()

## 5. Training

The training is similar to what we saw previously.

Since the Inception model is big, this will take a while, even though we use GPUs (one T4 / 2 users). On your laptop CPU this would probably take ~15 times longer. And we are not training the whole Inception! We have just a small thing on top + a very small dataset!

In [0]:
#We will take 80% from each class for training and 20% for validation
n_half = images0.shape[0]
n_train_half = n_half*80//100
n_train = n_train_half*2

x_train = np.concatenate([images0[:n_train_half], images1[:n_train_half]])
y_train = np.concatenate([labels0[:n_train_half], labels1[:n_train_half]])

x_valid = np.concatenate([images0[n_train_half:], images1[n_train_half:]])
y_valid = np.concatenate([labels0[n_train_half:], labels1[n_train_half:]])
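
To make the split sizes concrete, here is a small sketch with placeholder arrays instead of real images (the arrays of zeros and ones are assumptions standing in for the two languages). It reproduces the slicing above and confirms that both sets stay balanced between the classes.

```python
import numpy as np

n_half = 50                          # samples per language in this tutorial
n_train_half = n_half * 80 // 100    # 40 per language go to training
images0 = np.zeros((n_half, 1))      # placeholder for German samples
images1 = np.ones((n_half, 1))       # placeholder for Italian samples

x_train = np.concatenate([images0[:n_train_half], images1[:n_train_half]])
x_valid = np.concatenate([images0[n_train_half:], images1[n_train_half:]])

print(x_train.shape[0], x_valid.shape[0])   # 80 20
print(x_train.mean(), x_valid.mean())       # 0.5 0.5: classes equally represented
```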

We will use a callback to save checkpoints periodically during training; `save_freq` controls how often. The checkpoints contain the values of the trainable variables.

In [0]:
save_path = 'save/text_{epoch}.ckpt'
save_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_path,
                                                   save_weights_only=True,
                                                   save_freq=len(x_train)//10*25)

hist = model.fit(x_train, y_train,
                 epochs=150, batch_size=10, 
                 validation_data=(x_valid, y_valid),
                 callbacks=[save_callback])
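
The value passed as `save_freq` in the cell above can be worked out by hand. The sketch below assumes that `save_freq` is counted in batches, as in recent TensorFlow releases (some earlier TensorFlow 2 releases counted samples instead, which gives a different schedule).

```python
n_train, batch_size, epochs = 80, 10, 150

batches_per_epoch = n_train // batch_size     # 8 batches in every epoch
save_freq = n_train // 10 * 25                # 200, same expression as in the cell above
epochs_between_saves = save_freq / batches_per_epoch   # 25.0, IF save_freq counts batches

# epochs whose last batch is a multiple of save_freq
saved_epochs = [e for e in range(1, epochs + 1)
                if (e * batches_per_epoch) % save_freq == 0]
print(saved_epochs)   # [25, 50, 75, 100, 125, 150]
```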

In [0]:
fig, axs = plt.subplots(1, 2, figsize=(10,5))
axs[0].plot(hist.epoch, hist.history['loss'])
axs[0].plot(hist.epoch, hist.history['val_loss'])
axs[0].legend(('training loss', 'validation loss'), loc='lower right')
axs[1].plot(hist.epoch, hist.history['accuracy'])
axs[1].plot(hist.epoch, hist.history['val_accuracy'])

axs[1].legend(('training accuracy', 'validation accuracy'), loc='lower right')
plt.show()

We see that the training accuracy hits 100% quickly. Why do you think this happens? Consider that the loss keeps decreasing.
Also, on such a small dataset our model overfits.
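
One way to explore this question is to compute both metrics by hand on a toy example. The numbers below are made up: two samples, both classified correctly, with increasing confidence `p_correct`. Accuracy only looks at the `argmax`, while categorical cross-entropy also rewards higher confidence.

```python
import numpy as np

y_true = np.array([[1, 0], [0, 1]])   # two toy samples, one per class

def evaluate(p_correct):
    # Both samples get probability p_correct on the right class.
    y_pred = np.where(y_true == 1, p_correct, 1 - p_correct)
    loss = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))   # categorical cross-entropy
    acc = np.mean(y_pred.argmax(axis=1) == y_true.argmax(axis=1))
    return loss, acc

results = [evaluate(p) for p in (0.6, 0.9, 0.99)]
for p, (loss, acc) in zip((0.6, 0.9, 0.99), results):
    print(f"p={p}: loss={loss:.3f}, accuracy={acc:.0%}")
```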

## 6. Load trained variables

If we have already created the model, we can easily load the saved values of the trained variables from a checkpoint:

In [0]:
#early in training (assuming the first checkpoint was written at epoch 25):
model.load_weights('save/text_25.ckpt')
model.evaluate(images1[:1],  labels1[:1], verbose=2)

#at the end of training:
model.load_weights('save/text_150.ckpt')
model.evaluate(images1[:1],  labels1[:1], verbose=2)

## 7. Saving for inference

In TF2 it's easy to save a model for inference:

In [0]:
tf.saved_model.save(model, "inference_model/")

## 8. Inference

In [0]:
mod = tf.saved_model.load('inference_model')
func = mod.signatures["serving_default"]

In [0]:
output_name = model.output_names[0]  # single output
print(output_name)

In [0]:
res = func(tf.constant(images1[:1]))[output_name]
print(res)

Or we can make a nice wrapper:

In [0]:
class Inferer:
  def __init__(self, model_path, output_name):
    self.mod = tf.saved_model.load(model_path)
    self.func = self.mod.signatures["serving_default"]
    self.output_name = output_name
    self.class_names = np.array(['german', 'italian'])
    self.max_len = 64

  def infere_class_batch(self, inputs):
    probabilities = self.func(tf.constant(inputs))[self.output_name].numpy()
    classes = np.argmax(probabilities, axis=1)
    probs = probabilities[np.arange(len(classes)), classes]
    return classes, probs

  def infere_class(self, inputs):
    n = len(inputs)
    if n > self.max_len:
      classes = []
      probs = []
      for i in range( (n+self.max_len-1) // self.max_len):
        batch = inputs[i* self.max_len : (i+1)* self.max_len]
        batch_classes, batch_probs = self.infere_class_batch(batch)
        classes.append(batch_classes)
        probs.append(batch_probs)
      classes = np.concatenate(classes)
      probs = np.concatenate(probs)
    else:
      classes, probs = self.infere_class_batch(inputs)

    return classes, probs

  def infere(self, inputs, prob=False):
    classes, probs = self.infere_class(inputs)
    cn = self.class_names[classes]
    return (cn, probs) if prob else cn
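
The chunking done in `infere_class` can be looked at in isolation. In this sketch `split_into_batches` is a hypothetical helper (not part of the tutorial's `utils`) that applies the same ceiling division and slicing to a plain array of indices.

```python
import numpy as np

def split_into_batches(inputs, max_len):
    # Ceiling division gives the number of chunks; the last one may be shorter.
    n_batches = (len(inputs) + max_len - 1) // max_len
    return [inputs[i * max_len:(i + 1) * max_len] for i in range(n_batches)]

batches = split_into_batches(np.arange(150), max_len=64)
print([len(b) for b in batches])   # [64, 64, 22]
```

Concatenating the per-batch results, as the wrapper does, restores the original order of the inputs.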

In [0]:
inf = Inferer('inference_model', output_name)

In [0]:
inf.infere(images0)

In [0]:
images_all = np.concatenate([images0, images1])

In [0]:
inf.infere(images_all, prob=True) # output class and confidence probability
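
How the wrapper turns softmax outputs into a class name and a confidence value can be sketched with made-up probabilities (the numbers below are not model outputs). The indexing with `np.arange` picks, for each row, the probability of the winning class.

```python
import numpy as np

probabilities = np.array([[0.9, 0.1],
                          [0.2, 0.8],
                          [0.55, 0.45]])   # made-up softmax outputs for 3 samples
class_names = np.array(['german', 'italian'])

classes = np.argmax(probabilities, axis=1)                 # winning class per row
probs = probabilities[np.arange(len(classes)), classes]    # its probability
print(class_names[classes], probs)
```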

## 9. Improving the results

Often, as in this example, we don't have enough labeled data at hand. We need to use it as efficiently as possible.
One way to do this is to apply training data augmentation: we can slightly distort the data, e.g. rescale it, to effectively multiply the dataset.

We will generate rescaled images: the minimum scale makes the smaller dimension equal to 256 pixels, the maximum scale is 130%. Let's define a function which will do this job:

In [0]:
def get_random_scaled_img(file, minsize = 256, scalemax=1.3):
    im = Image.open(file)
    w, h = im.size
    # get minimal possible size
    scalemin =float(minsize) / min(w,h)
    # get a rescale factor from a uniform distribution.
    scale = scalemin + np.random.rand() * (scalemax - scalemin)
    w1 = int(max(minsize, scale*w))
    h1 = int(max(minsize, scale*h))
    
    #rescale with smoothing
    im1 = im.resize((w1,h1), Image.LANCZOS)  # same filter as ANTIALIAS, which newer Pillow versions removed
    #get numpy array from the PIL Image
    img_arr = np.array(im1.convert('RGB'))

    #crop to 256x256, preventing further resize by prepare_training_img
    r = (img_arr.shape[0] - minsize) // 2
    c = (img_arr.shape[1] - minsize) // 2
    img_arr = img_arr[r:r+minsize,c:c+minsize]

    return img_arr
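
The scale range and the central crop can be traced without opening any file. This sketch uses a hypothetical 400 x 300 image and replaces the random number by three fixed values `u`; an empty array stands in for the resized image.

```python
import numpy as np

w, h, minsize, scalemax = 400, 300, 256, 1.3   # hypothetical image size
scalemin = minsize / min(w, h)                 # smallest scale keeping the short side >= 256

crop_shapes = []
for u in (0.0, 0.5, 1.0):                      # u stands in for np.random.rand()
    scale = scalemin + u * (scalemax - scalemin)
    w1 = int(max(minsize, scale * w))          # max() guards against rounding below 256
    h1 = int(max(minsize, scale * h))
    r = (h1 - minsize) // 2                    # top row of the central crop
    c = (w1 - minsize) // 2                    # left column of the central crop
    crop = np.zeros((h1, w1, 3))[r:r + minsize, c:c + minsize]
    crop_shapes.append(crop.shape)
    print(f"scale={scale:.3f} resized={w1}x{h1} crop={crop.shape}")
```

Whatever the scale, the crop is always 256 x 256, so `prepare_training_img` has nothing left to resize.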

Let's check the rescaled images.

In [0]:
n_smpl=2
scaled_imgs=[get_random_scaled_img('ML3/de/%d_%d.jpg'%(1, 1)) for i in range(n_smpl**2)]
fig, ax = plt.subplots(n_smpl, n_smpl, figsize=(n_smpl*4, n_smpl*4))
for row in range(n_smpl):
    for col in range(n_smpl):
        ax[col, row].imshow(scaled_imgs[row*n_smpl+col])
        ax[col, row].grid(False)

Read the images again, now generating 5 rescaled versions of each one.

In [0]:
labels0 = []
images0 = []
labels1 = []
images1 = []

mult = 5
#German
for book in range(1,6):
    for sample in range(1,11):
        for itr in range(mult):
            img = get_random_scaled_img('ML3/de/%d_%d.jpg'%(book, sample))
            assert(img.shape[0]>=256 and img.shape[1]>=256 and len(img.shape)==3)
            images0.append(inception.prepare_training_img(img))
            labels0.append([1,0])
#Italian
for book in range(1,6):
    for sample in range(1,11):
        for itr in range(mult):
            img = get_random_scaled_img('ML3/it/%d_%d.jpg'%(book, sample))
            assert(img.shape[0]>=256 and img.shape[1]>=256 and len(img.shape)==3)
            images1.append(inception.prepare_training_img(img))
            labels1.append([0,1])
        
idx = np.random.permutation(len(labels0))
labels0 = np.array(labels0)[idx]
images0 = np.array(images0)[idx]
labels1 = np.array(labels1)[idx]
images1 = np.array(images1)[idx]

And finally we train again, in the same way. We only change the number of epochs: before we had 150, but now that we have 5 times more training data we'll do 60. Although 60 > 150/5, it looks like it takes a bit more time to converge.
We create a new model, `model_aug`, with the same architecture as before.

In [0]:
#We will take 80% from each class for training and 20% for validation
n_half = images0.shape[0]
n_train_half = n_half*80//100
n_train = n_train_half*2

x_train = np.concatenate([images0[:n_train_half], images1[:n_train_half]])
y_train = np.concatenate([labels0[:n_train_half], labels1[:n_train_half]])

x_valid = np.concatenate([images0[n_train_half:], images1[n_train_half:]])
y_valid = np.concatenate([labels0[n_train_half:], labels1[n_train_half:]])

In [0]:
model_aug = tf.keras.models.Sequential([
                                    InceptionCut(-1),
                                    tf.keras.layers.Dense(512, activation='sigmoid'),
                                    tf.keras.layers.Dense(2, activation='softmax')
                                    ])

model_aug.compile(optimizer=tf.keras.optimizers.Adam(0.0005,) ,
              loss='categorical_crossentropy',
              metrics=['accuracy'])

In [0]:
save_path = 'save/text_augmented_{epoch}.ckpt'
save_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_path, save_weights_only=True)

hist = model_aug.fit(x_train, y_train,
                 epochs=60, batch_size=10, 
                 validation_data=(x_valid, y_valid),
                 callbacks=[save_callback])

In [0]:
fig, axs = plt.subplots(1, 2, figsize=(10,5))
axs[0].plot(hist.epoch, hist.history['loss'])
axs[0].plot(hist.epoch, hist.history['val_loss'])
axs[0].legend(('training loss', 'validation loss'), loc='lower right')
axs[1].plot(hist.epoch, hist.history['accuracy'])
axs[1].plot(hist.epoch, hist.history['val_accuracy'])

axs[1].legend(('training accuracy', 'validation accuracy'), loc='lower right')
plt.show()

In [0]:
tf.saved_model.save(model_aug, "inference_model_aug/")

We had a REALLY small dataset for such a complicated task. Does it really generalize? Maybe it just memorizes all the images we fed into it? Let's perform a test. `w1.jpg` and `w2.jpg` are text screenshots from Wikipedia in [Italian](https://it.wikipedia.org/wiki/Apprendimento_automatico) and [German](https://de.wikipedia.org/wiki/Maschinelles_Lernen).

In [0]:
# load images
im_wiki_1 = plt.imread('ML3/w1.jpg')
im_wiki_2 = plt.imread('ML3/w2.jpg')

# crop/convert for proper color range
im_wiki_1_p = inception.prepare_training_img(im_wiki_1)[np.newaxis]
im_wiki_2_p = inception.prepare_training_img(im_wiki_2)[np.newaxis]

In [0]:
inf = Inferer('inference_model_aug', output_name)

In [0]:
class_name, prob = inf.infere(np.concatenate([im_wiki_1_p, im_wiki_2_p]), prob=True)


print('probabilities for w1:', prob[0], 'detected language:', class_name[0])
print('probabilities for w2:', prob[1], 'detected language:', class_name[1])

# Show image crops
plt.imshow( inception.training_img_to_display(im_wiki_1_p[0]))
plt.show()
plt.imshow( inception.training_img_to_display(im_wiki_2_p[0]))
plt.show()



## 12. Exercise 1

There is a serious problem in the example above: the training and validation datasets are not independent. We generated 5 randomly scaled images from each initial image. With high probability, of the 5 images generated from the same initial one, some will end up in the training dataset and some in the validation dataset. Since they are generated from the same initial image, they are not fully independent. This compromises the evaluation of the model, leading to an overestimate of its performance.
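
The size of this leak can be estimated without touching the images. The sketch below is a simulation under assumptions: it only tracks which source image each augmented copy comes from, shuffles the copies of one language the way the augmentation section does, and counts the sources that land on both sides of the 80/20 split.

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, mult = 50, 5                                # 50 source images, 5 copies each
source_id = np.repeat(np.arange(n_src), mult)      # source image of every augmented copy
idx = rng.permutation(len(source_id))              # shuffle the copies
n_train = len(idx) * 80 // 100

train_src = set(source_id[idx[:n_train]])
valid_src = set(source_id[idx[n_train:]])
leaking = train_src & valid_src                    # sources present on both sides
print(f"{len(leaking)} of {len(valid_src)} validation sources also appear in training")
```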

1. Modify the generation of the training and validation datasets to fulfil the requirement of independence.
2. Check how the validation accuracy and loss change.

## 13. Exercise 2

(Hope we have time left....)
Test the performance of the model trained on NOT rescaled images on the wiki screenshots.

In [0]:
# copy the above code here
# load the original model

## 14. Homework (3 options)

### 14.1 Improve training set

So far we scaled images as a whole. 
- Try to scale differently in $x$ and $y$ direction.
- Check how it affects performance.
- Which other transformations would make sense for text data?
- Get hands dirty.

### 14.2 Try to use lower layers' outputs from Inception to build the classifier.

So far we used last output of Inception.
- Look at the Inception more carefully.
- Inspect the size of the data array at different layers.
- Since inside you have 3D data (2D image * features at each position) you will need to flatten it. Look how this is done in last layers (`head0`). Alternatively you can create convolutional layers.
- Ask, google it, and get your hands dirty!
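
As a starting point for the flattening step, here is a shape-only sketch. The activation shape is hypothetical (look up the real one in the graph), and the averaging variant is just one possible alternative, not what `head0` necessarily does.

```python
import numpy as np

# Hypothetical activation of an intermediate layer: batch x height x width x channels.
act = np.zeros((10, 14, 14, 480))

flat = act.reshape(act.shape[0], -1)   # keep the batch axis, merge all the others
print(flat.shape)                      # (10, 94080)

pooled = act.mean(axis=(1, 2))         # alternative: average over the spatial positions
print(pooled.shape)                    # (10, 480)
```

Flattening keeps the spatial layout but produces a very wide input for the dense layer; averaging is much smaller but discards where in the image a feature occurred.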

### 14.3 Classify 3 languages.

So far we tried two languages.
- Create 50 crops of text in another language (better to use 5 sources with different fonts, otherwise you risk learning the font, not the language), image size > 300 x 300 (to allow scaling).
- Upload them to the `ML3` directory inside of a new directory `xx`.
- Repeat everything with 3 classes.
- Think of the case when this approach won't work.
- Get hands dirty!!!